Replication files for: Diaz, Gustavo and Erin L. Rossiter. 2025. "Balancing Precision and Retention in Experimental Design." Forthcoming in Political Analysis.

Gustavo Diaz
Northwestern University
gustavo.diaz@northwestern.edu

Erin L. Rossiter
University of Notre Dame
erossite@nd.edu

*** Replication instructions ***

To replicate the analyses presented in the main paper and appendix, run the run.R script, which executes the code files listed below in sequential order. See the codebook (Diaz and Rossiter Codebook.pdf) for more information on the data used in the analyses.

The code assumes that the working directory is set to the top-level folder of the replication archive. All file paths in the code are relative to this top-level folder.

The original analyses were run with R 4.4.0 on a Mac with an Apple M1 Max chip. The total runtime is approximately 3.5 hours. The output of sessionInfo() is listed below.

R version 4.4.0 (2024-04-24)
Platform: aarch64-apple-darwin20
Running under: macOS Sonoma 14.4.1

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Chicago
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] tinytable_0.3.0      haven_2.5.4          DeclareDesign_1.0.10 fabricatr_1.0.2      randomizr_1.0.0      blockTools_0.6.4     nbpMatching_1.5.5    emmeans_1.10.3      
 [9] estimatr_1.0.4       kableExtra_1.4.0     lubridate_1.9.3      forcats_1.0.0        stringr_1.5.1        dplyr_1.1.4          purrr_1.0.4          readr_2.1.5         
[17] tidyr_1.3.1          tibble_3.2.1         ggplot2_3.5.2        tidyverse_2.0.0     

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.1   viridisLite_0.4.2  farver_2.1.2       fastmap_1.2.0      TH.data_1.1-2      digest_0.6.35      rpart_4.1.23       timechange_0.3.0   estimability_1.5.1
[10] lifecycle_1.0.4    cluster_2.1.6      survival_3.5-8     magrittr_2.0.3     compiler_4.4.0     rlang_1.1.6        Hmisc_5.1-3        tools_4.4.0        yaml_2.3.8        
[19] data.table_1.17.0  knitr_1.46         htmlwidgets_1.6.4  plyr_1.8.9         xml2_1.3.6         RColorBrewer_1.1-3 multcomp_1.4-26    withr_3.0.2        foreign_0.8-86    
[28] nnet_7.3-19        grid_4.4.0         xtable_1.8-4       colorspace_2.1-0   scales_1.4.0       MASS_7.3-60.2      cli_3.6.5          mvtnorm_1.2-5      rmarkdown_2.27    
[37] generics_0.1.3     rstudioapi_0.16.0  tzdb_0.4.0         splines_4.4.0      base64enc_0.1-3    vctrs_0.6.5        Matrix_1.7-0       sandwich_3.1-0     hms_1.1.3         
[46] Formula_1.2-5      htmlTable_2.4.2    stargazer_5.2.3    systemfonts_1.1.0  glue_1.8.0         modelsummary_2.1.1 codetools_0.2-20   stringi_1.8.7      gtable_0.3.6      
[55] tables_0.9.28      pillar_1.10.2      htmltools_0.5.8.1  R6_2.6.1           evaluate_0.23      lattice_0.22-6     backports_1.4.1    Rcpp_1.0.12        svglite_2.1.3     
[64] coda_0.19-4.1      gridExtra_2.3      checkmate_2.3.1    xfun_0.44          zoo_1.8-12         pkgconfig_2.0.3   


*** Inventory of folders and files ***

run.R -- Script that executes all code in the correct order to produce run.log and all outputs in data/processed_data, figures/, and tables/.

run.log -- The log produced by run.R

Diaz and Rossiter Codebook.pdf -- Codebook describing all data files in 'data/raw_data' and 'data/external_data'

code -- Folder for R script files that replicate the analyses in the main text and appendix.
code/1_handcoding.R -- analyzes the handcoding exercise
code/2_cleanreplicationdata.R -- cleans the originally collected experimental data replicating three published experiments
code/3_tappinhewitt_blockrand.R -- conducts block randomization of the Tappin and Hewitt replication
code/4_descriptives.R -- analyzes experimental data for appendix figures and tables
code/5_replication_simulation.R -- analyzes experimental data for main preregistered simulation
code/6_simulation_study.R -- conducts simulation using six published experiments' replication archives
code/7_simulation_analysis.R -- plots simulation results
code/8_simulation_nonrandom.R -- simulates and plots results when sample loss is not random
code/designer.R -- custom DeclareDesign "designer" function for the simulations in '6_simulation_study.R'

data/external_data -- Folder for data files that have been downloaded from published articles' replication archives.
data/external_data/Curiel_et_al.RData -- Downloadable from Harvard Dataverse here: https://doi.org/10.7910/DVN/QR1EIR
data/external_data/Galasso_et_al.dta -- Downloadable from Harvard Dataverse here: https://doi.org/10.7910/DVN/BN1GVD
data/external_data/Goerger_et_al.csv -- Downloadable from Harvard Dataverse here: https://doi.org/10.7910/DVN/IDAIUZ
data/external_data/Lyon.csv -- Downloadable from Harvard Dataverse here: https://doi.org/10.7910/DVN/H1RSFY
data/external_data/Manekin_Mitts.rdata -- Downloadable from Harvard Dataverse here: https://doi.org/10.7910/DVN/SHHVCA
data/external_data/Simas.dta -- Downloadable from Harvard Dataverse here: https://doi.org/10.7910/DVN/IONQ8V

data/processed_data -- Folder for data files that are produced from the analyses.
data/processed_data/BayramGrahamReplication-clean.rds -- cleaned data from our replication of Bayram and Graham 
data/processed_data/diagnosis_study1.RDS -- simulation output from Galasso et al.
data/processed_data/diagnosis_study2.RDS -- simulation output from Manekin and Mitts
data/processed_data/diagnosis_study3.RDS -- simulation output from Lyon
data/processed_data/diagnosis_study4.RDS -- simulation output from Goerger et al.
data/processed_data/diagnosis_study5.RDS -- simulation output from Curiel et al.
data/processed_data/diagnosis_study6.RDS -- simulation output from Simas
data/processed_data/DietrichHayesReplication-clean.rds -- cleaned data from our replication of Dietrich and Hayes
data/processed_data/TappinHewitt-blockrandtreatments.csv -- output from multivariate block randomization from our replication of Tappin and Hewitt
data/processed_data/TappinHewitt-embdata.txt -- embedded data for the Qualtrics implementation of multivariate block randomization from our replication of Tappin and Hewitt; identifiers have been omitted
data/processed_data/TappinHewittReplication-clean.rds -- cleaned data from our replication of Tappin and Hewitt

data/raw_data -- Folder for data files that are used as inputs to the analyses.
data/raw_data/BayramGrahamReplication.rds -- raw data from our replication of Bayram and Graham; identifiers have been omitted
data/raw_data/DietrichHayesReplication.rds -- raw data from our replication of Dietrich and Hayes; identifiers have been omitted
data/raw_data/handcoding.csv -- data from our handcoding exercise
data/raw_data/TappinHewittReplication.rds -- raw data from our replication of Tappin and Hewitt; identifiers have been omitted

figures -- Folder to which figure files are written during the analyses.
figures/fig1.pdf -- Article Figure 1
figures/figE1.pdf -- Appendix Figure E1
figures/figE2.pdf -- Appendix Figure E2
figures/figE3.pdf -- Appendix Figure E3
figures/figE4.pdf -- Appendix Figure E4
figures/figE8.pdf -- Appendix Figure E8
figures/figG10.pdf -- Appendix Figure G10

surveyinstruments -- Folder for the Qualtrics Survey Format (.qsf) files used to collect the original data in the data/raw_data folder.
surveyinstruments/Bayram and Graham Replication.qsf -- Qualtrics implementation of Bayram and Graham
surveyinstruments/Dietrich and Hayes Replication.qsf -- Qualtrics implementation of Dietrich and Hayes
surveyinstruments/Tappin and Hewitt Replication Wave 1.qsf -- Qualtrics implementation of Wave 1 of Tappin and Hewitt
surveyinstruments/Tappin and Hewitt Replication Wave 2.qsf -- Qualtrics implementation of Wave 2 of Tappin and Hewitt
surveyinstruments/Tappin and Hewitt Replication Wave 3.qsf -- Qualtrics implementation of Wave 3 of Tappin and Hewitt

tables -- Folder to which text files containing the information reported in tables are written during the analyses.
tables/tableD1.txt -- Appendix Table D1
tables/tableE2.txt -- Appendix Table E2
tables/tableE3_bottomhalf.txt -- Appendix Table E3, bottom row
tables/tableE3_tophalf.txt -- Appendix Table E3, top row
tables/tableE4_bottomhalf.txt -- Appendix Table E4, bottom row
tables/tableE4_tophalf.txt -- Appendix Table E4, top row
tables/tableE5.txt -- Appendix Table E5
tables/tableE6.txt -- Appendix Table E6
tables/tableE7.txt -- Appendix Table E7